{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "a6199b61-8084-4a2f-8488-2ddc4c260776",
   "metadata": {},
   "source": [
    "# Using Vertext AI\n",
    "\n",
    "Vertex AI offers everything you need to build and use generative AI—from AI solutions, to Search and Conversation, to 100+ foundation models, to a unified AI platform. You get access to models like PaLM 2 which can be used to score your RAG responses and pipelines with Ragas instead of the default OpenAI.\n",
    "\n",
    "This tutorial will show you can you can use PaLM 2 with Ragas for evaluation."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "79986e4d-17e4-4e88-ad2e-ca2b34a63de9",
   "metadata": {},
   "source": [
    ":::{Note}\n",
    "this guide is for folks who are using the Amazon Bedrock endpoints. Check the [evaluation guide](../../getstarted/evaluation.md) if your using OpenAI endpoints.\n",
    ":::"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8bd757ae-51ee-4527-b2fb-0d0b88285939",
   "metadata": {},
   "source": [
    "## Load Sample Dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "0d3e6c99-c19c-44a1-8f05-4bde2de30866",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Found cached dataset fiqa (/home/jjmachan/.cache/huggingface/datasets/explodinggradients___fiqa/ragas_eval/1.0.0/3dc7b639f5b4b16509a3299a2ceb78bf5fe98ee6b5fee25e7d5e4d290c88efb8)\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "b0c4c79a44334e1198e1b7283e4532b5",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  0%|          | 0/1 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "DatasetDict({\n",
       "    baseline: Dataset({\n",
       "        features: ['question', 'ground_truths', 'answer', 'contexts'],\n",
       "        num_rows: 30\n",
       "    })\n",
       "})"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# data\n",
    "from datasets import load_dataset\n",
    "\n",
    "amnesty_qa = load_dataset(\"explodinggradients/amnesty_qa\", \"english_v2\")\n",
    "amnesty_qa"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4e67daaa-60e3-4584-8ec6-944c3c5a1a0c",
   "metadata": {},
   "source": [
    "```\n",
    "\n",
    "Now lets import the metrics we are going to use\n",
    "\n",
    "```python\n",
    "from ragas.metrics import (\n",
    "    context_precision,\n",
    "    answer_relevancy,  # AnswerRelevancy\n",
    "    faithfulness,\n",
    "    context_recall,\n",
    ")\n",
    "from ragas.metrics.critique import harmfulness\n",
    "\n",
    "# list of metrics we're going to use\n",
    "metrics = [\n",
    "    faithfulness,\n",
    "    answer_relevancy,\n",
    "    context_recall,\n",
    "    context_precision,\n",
    "    harmfulness,\n",
    "]\n",
    "```\n",
    "\n",
    "By default Ragas uses `ChatOpenAI` for evaluations, lets swap that out with `ChatVertextAI`. We also need to change the embeddings used for evaluations for `OpenAIEmbeddings` to `VertextAIEmbeddings` for metrices that need it, which in our case is `answer_relevancy`.\n",
    "\n",
    "Now in order to use the new `ChatVertextAI` llm instance with Ragas metrics, you have to create a new instance of `RagasLLM` using the `ragas.llms.LangchainLLM` wrapper. Its a simple wrapper around langchain that make Langchain LLM/Chat instances compatible with how Ragas metrics will use them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "09ef783b-12dd-40e8-bdbf-4744e41038dd",
   "metadata": {},
   "outputs": [],
   "source": [
    "import google.auth\n",
    "from langchain.chat_models import ChatVertexAI\n",
    "from ragas.llms import LangchainLLM\n",
    "from langchain.embeddings import VertexAIEmbeddings\n",
    "\n",
    "\n",
    "config = {\n",
    "    \"project_id\": \"tmp-project-404003\",\n",
    "}\n",
    "\n",
    "# authenticate to GCP\n",
    "creds, _ = google.auth.default(quota_project_id=\"tmp-project-404003\")\n",
    "# create Langchain LLM and Embeddings\n",
    "chat = ChatVertexAI(credentials=creds)\n",
    "vertextai_embeddings = VertexAIEmbeddings(credentials=creds)\n",
    "\n",
    "# create a wrapper around it\n",
    "ragas_vertexai_llm = LangchainLLM(chat)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "90e739a1-dbb4-42ce-adc2-b8cf88ae7a58",
   "metadata": {},
   "source": [
    "Now lets swap out the defaults with the VertexAI LLM and Embeddings we created."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "280464c1-dbef-4200-85ec-dc2a4bffee14",
   "metadata": {},
   "outputs": [],
   "source": [
    "for m in metrics:\n",
    "    # change LLM for metric\n",
    "    m.__setattr__(\"llm\", ragas_vertexai_llm)\n",
    "\n",
    "    # check if this metric needs embeddings\n",
    "    if hasattr(m, \"embeddings\"):\n",
    "        # if so change with VertexAI Embeddings\n",
    "        m.__setattr__(\"embeddings\", vertextai_embeddings)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "27a34b9d-796e-4025-ba8e-a799f6ab3c6b",
   "metadata": {},
   "source": [
    "## Evaluation\n",
    "\n",
    "Running the evalutation is as simple as calling evaluate on the `Dataset` with the metrics of your choice."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "5e739223-4e34-4dc0-9892-4625fdea7489",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "evaluating with [faithfulness]\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|█████████████████████████████████████████████████████████████| 1/1 [00:04<00:00,  4.53s/it]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "evaluating with [answer_relevancy]\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|█████████████████████████████████████████████████████████████| 1/1 [00:03<00:00,  3.88s/it]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "evaluating with [context_recall]\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|█████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.71s/it]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "evaluating with [context_precision]\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|█████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.05it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "evaluating with [harmfulness]\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|█████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.02it/s]\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'faithfulness': 1.0000, 'answer_relevancy': 0.9113, 'context_recall': 0.0000, 'context_precision': 0.0000, 'harmfulness': 0.0000}"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from ragas import evaluate\n",
    "\n",
    "result = evaluate(\n",
    "    amnesty_qa[\"eval\"].select(range(1)),  # using 1 as example due to quota constrains\n",
    "    metrics=metrics,\n",
    ")\n",
    "\n",
    "result"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "960f88fc-c90b-4ac6-8e97-252edd2f1661",
   "metadata": {},
   "source": [
    "and there you have the it, all the scores you need.\n",
    "\n",
    "now if we want to dig into the results and figure out examples where your pipeline performed worse or really good you can easily convert it into a pandas array and use your standard analytics tools too!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "bc72c682-b0c0-4314-9da5-9b22aae722a4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>question</th>\n",
       "      <th>contexts</th>\n",
       "      <th>answer</th>\n",
       "      <th>ground_truths</th>\n",
       "      <th>faithfulness</th>\n",
       "      <th>answer_relevancy</th>\n",
       "      <th>context_recall</th>\n",
       "      <th>context_precision</th>\n",
       "      <th>harmfulness</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>How to deposit a cheque issued to an associate...</td>\n",
       "      <td>[Just have the associate sign the back and the...</td>\n",
       "      <td>\\nThe best way to deposit a cheque issued to a...</td>\n",
       "      <td>[Have the check reissued to the proper payee.J...</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.911332</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                            question  \\\n",
       "0  How to deposit a cheque issued to an associate...   \n",
       "\n",
       "                                            contexts  \\\n",
       "0  [Just have the associate sign the back and the...   \n",
       "\n",
       "                                              answer  \\\n",
       "0  \\nThe best way to deposit a cheque issued to a...   \n",
       "\n",
       "                                       ground_truths  faithfulness  \\\n",
       "0  [Have the check reissued to the proper payee.J...           1.0   \n",
       "\n",
       "   answer_relevancy  context_recall  context_precision  harmfulness  \n",
       "0          0.911332             0.0                0.0            0  "
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = result.to_pandas()\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c08f613d-880c-4e13-9716-05ce23fa4576",
   "metadata": {},
   "source": [
    "And thats it!\n",
    "\n",
    "if you have any suggestion/feedbacks/things your not happy about, please do share it in the [issue section](https://github.com/explodinggradients/ragas/issues). We love hearing from you 😁"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
